Similarity between Words Computed by Spreading Activation on an English Dictionary






Hideki Kozima 

Course in Computer Science 
and Information Mathematics, 
Graduate School, 
University of Electro-Communications 
1-5-1, Chofugaoka, Chofu,
Tokyo 182, Japan
(xkozima@phaeton.cs.uec.ac.jp)



Abstract 

This paper proposes a method for measur- 
ing semantic similarity between words as 
a new tool for text analysis. The simi- 
larity is measured on a semantic network 
constructed systematically from a subset 
of the English dictionary, LDOCE (Longman
Dictionary of Contemporary English).
Spreading activation on the network can di- 
rectly compute the similarity between any 
two words in the Longman Defining Vocab- 
ulary, and indirectly the similarity of all the 
other words in LDOCE. The similarity rep- 
resents the strength of lexical cohesion or 
semantic relation, and also provides valu- 
able information about similarity and co- 
herence of texts. 

1 Introduction 

A text is not just a sequence of words; it also has a
coherent structure. The meaning of each word in a text
depends on the structure of the text. Recognizing the
structure of text is an essential task in text
understanding [Grosz and Sidner, 1986].

One of the valuable indicators of the structure of text
is lexical cohesion [Halliday and Hasan, 1976]. Lexical
cohesion is the relationship between words, classified
as follows:

1. Reiteration: 

Molly likes cats. She keeps a cat. 

2. Semantic relation: 

a. Desmond saw a cat. It was Molly's pet. 
b. Molly goes to the north. Not east.

c. Desmond goes to a theatre. He likes films. 

Reiteration of words is easy to capture by morpho- 
logical analysis. Semantic relation between words, 
which is the focus of this paper, is hard to recognize 
by computers. 

We consider lexical cohesion as semantic similarity
between words. Similarity is computed by spreading
activation (or association) [Waltz and Pollack, 1985]
on a semantic network constructed systematically from
an English dictionary. Although it is edited by
lexicographers, a dictionary is a set of associative
relations shared by the people in a linguistic community.

The similarity between words is a mapping
$\sigma: L \times L \to [0,1]$, where $L$ is a set of words
(or lexicon). The following examples suggest the feature of
the similarity:

    $\sigma$(cat, pet) = 0.133722 (similar),
    $\sigma$(cat, mat) = 0.002692 (dissimilar).

The value of $\sigma(w, w')$ increases with the strength of
the semantic relation between $w$ and $w'$.

The following section examines related work in or- 
der to clarify the nature of the semantic similarity. 
Section 3 describes how the semantic network is sys- 
tematically constructed from the English dictionary. 
Section 4 explains how to measure the similarity by 
spreading activation on the semantic network. Sec- 
tion 5 shows applications of the similarity measure — 
computing similarity between texts, and measuring 
coherence of a text. Section 6 discusses the theoret- 
ical aspects of the similarity. 
Figure 1. A psycholinguistic measurement
(semantic differential [Osgood, 1952]).




2 Related Work on Measuring 
Similarity 

Words in a language are organized by two kinds of 
relationship. One is a syntagmatic relation: how the 
words are arranged in sequential texts. The other is a 
paradigmatic relation: how the words are associated 
with each other. Similarity between words can be 
defined by either a syntagmatic or a paradigmatic 
relation. 

Syntagmatic similarity is based on co-occurrence data
extracted from corpora [Church and Hanks, 1990],
definitions in dictionaries [Wilks et al., 1989], and so
on. Paradigmatic similarity is based on association data
extracted from thesauri [Morris and Hirst, 1991],
psychological experiments [Osgood, 1952], and so on.

This paper concentrates on paradigmatic similarity,
because a paradigmatic relation can be established both
inside a sentence and across sentence boundaries, while
syntagmatic relations are seen mainly inside a sentence,
just as syntax deals with sentence structure. The rest of
this section focuses on two lines of related work on
measuring paradigmatic similarity: a psycholinguistic
approach and a thesaurus-based approach.

2.1 A Psycholinguistic Approach 

Psycholinguists have proposed methods for measuring
similarity. One of the pioneering works is the 'semantic
differential' [Osgood, 1952], which analyses the meaning
of words into a range of different dimensions with
opposed adjectives at both ends (see Figure 1), and
locates the words in the semantic space.

Recent works on knowledge representation are somewhat
related to Osgood's semantic differential. Most of them
describe the meaning of words using special symbols like
microfeatures [Waltz and Pollack, 1985, Hendler, 1989]
that correspond to the semantic dimensions.

However, two problems arise from the semantic
differential procedure as a measurement of meaning.
First, the procedure is not based on the denotative
meaning of a word, but only on the connotative emotions
attached to the word. Second, it is difficult to choose
the relevant dimensions, i.e. the dimensions required for
a sufficient semantic space.

2.2 A Thesaurus-based Approach 

Morris and Hirst [1991] used Roget's thesaurus as a
knowledge base for determining whether or not two words
are semantically related. For example, the semantic
relations of truck/car and drive/car are captured in the
following way:

1. truck $\in$ vehicle $\ni$ car
   (both are included in the vehicle class),

2. drive $\in$ journey $\to$ vehicle $\ni$ car
   (journey refers to vehicle).

This method can capture almost all types of semantic
relations (except emotional and situational relations),
such as paraphrasing by superordinate (e.g. cat/pet),
systematic relations (e.g. north/east), and
non-systematic relations (e.g. theatre/film).

However, thesauri provide information neither about the
semantic difference between words juxtaposed in a
category, nor about the strength of the semantic relation
between words. Both are dealt with in this paper. The
reason is that thesauri are designed to help writers find
relevant words, not to provide the meaning of words.

3 Paradigme: A Field for Measuring 
Similarity 

We analyse word meaning in terms of the seman- 
tic space defined by a semantic network, called 
Paradigme. Paradigme is systematically constructed 
from Glosseme, a subset of an English dictionary. 

3.1 Glosseme — A Closed Subsystem of 
English 

A dictionary is a closed paraphrasing system of nat- 
ural language. Each of its headwords is defined by 
a phrase which is composed of the headwords and 
their derivations. A dictionary, viewed as a whole, 
looks like a tangled network of words. 

We adopted Longman Dictionary of Contemporary 
English (LDOCE) [1987] as such a closed system of 
English. LDOCE has a unique feature that each of 
its 56,000 headwords is defined by using the words in 
Longman Defining Vocabulary (hereafter, LDV) and 
their derivations. LDV consists of 2,851 words (as 



red 1 /red/ adj -dd- 1 of the colour of blood
or fire: a red rose/dress | We painted the door
red. - see also like a red rag to a bull
(RAG 1) 2 (of human hair) of a bright brownish
orange or copper colour 3 (of the human skin)
pink, usu. for a short time: I turned red with
embarrassment/anger. | The child's eyes (= the
skin round the eyes) were red from crying. 4
(of wine) of a dark pink to dark purple colour
- ~ness n [U]



(red adj                                            ; headword, word-class
  ((of the colour)                                  ; unit 1: head-part
   (of blood or fire))                              ;         det-part
  ((of a bright brownish orange or copper colour)   ; unit 2: head-part
   (of human hair))                                 ;         det-part
  ((pink)                                           ; unit 3: head-part
   (usu for a short time)                           ;         det-part 1
   (of the human skin))                             ;         det-part 2
  ((of a dark pink to dark purple colour)           ; unit 4: head-part
   (of wine)))                                      ;         det-part

Figure 2. A sample entry of LDOCE and a corresponding entry of Glosseme (in S-expression).



(red_1 (adj) 0.000000
  ;; referant
  (+ ;; subreferant 1
     (0.333333 ;; weight
      (* (0.001594 of.
         (0.042108 colour
         (0.185058 fire.
     ;; subreferant 2
     (0.277778
      (* (0.000278 of.
         (0.466411 orange.
         (0.007330 colour
         (0.016372 hair.
     ;; subreferant 3
     (0.222222
      (* (0.410692 pink
         (0.028846 short
         (0.000595 the
     ;; subreferant 4
     (0.166667
      (* (0.000328 of_1)
         (0.123290 pink_1)
         (0.000273 to_3)
         (0.141273 purple_2)
         (0.338512 wine_1)
  ;; refere
  (* (0.031058 apple_1) (
     (0.029140 copper_1) (
     (0.005464 fox_1) (
     (0.029140 orange_1) (
     (0.098349 pink_2) (
     (0.196698 red_2) (

Figure 3. A sample node of Paradigme (in S-expression).



the headwords in LDOCE) based on the survey of
restricted vocabulary [West, 1953].

We made a reduced version of LDOCE, called 
Glosseme. Glosseme has every entry of LDOCE 
whose headword is included in LDV. Thus, LDV is 
defined by Glosseme, and Glosseme is composed of 
LDV. Glosseme is a closed subsystem of English. 

Glosseme has 2,851 entries that consist of 101,861
words (35.73 words/entry on average). An item of
Glosseme has a headword, a word-class, and one or more
units corresponding to the numbered definitions in the
entry of LDOCE. Each unit has one head-part and several
det-parts. The head-part is the first phrase in the
definition, which describes the broader meaning of the
headword. The det-parts restrict the meaning of the
head-part. (See Figure 2.)
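The structure of a Glosseme item can be pictured with a small data model. The following is only an illustrative sketch: the class names `Unit` and `Entry` and the method `words` are hypothetical, and the sample data simply transcribes the entry for red shown in Figure 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """One numbered definition: a head-part plus zero or more det-parts."""
    head_part: List[str]
    det_parts: List[List[str]] = field(default_factory=list)

    def words(self) -> List[str]:
        """All words of the unit, head-part first, then each det-part."""
        out = list(self.head_part)
        for part in self.det_parts:
            out.extend(part)
        return out


@dataclass
class Entry:
    """A Glosseme item: headword, word-class, and its units."""
    headword: str
    word_class: str
    units: List[Unit]


# The entry for "red" (adjective), transcribed from Figure 2.
red = Entry("red", "adj", [
    Unit("of the colour".split(), ["of blood or fire".split()]),
    Unit("of a bright brownish orange or copper colour".split(),
         ["of human hair".split()]),
    Unit(["pink"], ["usu for a short time".split(),
                    "of the human skin".split()]),
    Unit("of a dark pink to dark purple colour".split(),
         ["of wine".split()]),
])
```

Keeping the head-part separate from the det-parts matters later, because words in a head-part receive a heavier link when the entry is mapped onto Paradigme.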



3.2 Paradigme — A Semantic Network 

We then translated Glosseme into a semantic network,
Paradigme. Each entry in Glosseme is mapped onto a node
in Paradigme. Paradigme has 2,851 nodes and 295,914
unnamed links between the nodes (103.79 links/node on
average). Figure 3 shows a sample node red_1. Each node
consists of a headword, a word-class, an activity-value,
and two sets of links: a referant and a refere.

A referant of a node consists of several subreferants
that correspond to the units of Glosseme. As shown in
Figures 2 and 3, a morphological analysis maps the word
brownish in the second unit onto a link to the node
brown_1, and the word colour onto two links to colour_1
(adjective) and colour_2 (noun).




[Figure 5: activity values plotted against T (steps) for the nodes
red_2, red_1, orange_1, pink_1, pink_2, blood_1, copper_1, purple_1,
purple_2, rose_1.]

Figure 5. An activated pattern produced from red
(changing of activity values of 10 nodes
holding highest activity at T=10).



A refere of a node $p$ records the nodes referring to
$p$. For example, the refere of red_1 is a set of links to
nodes (e.g. apple_1) that have a link to red_1 in their
referants. The refere provides information about the
extension of red_1, not the intension shown in the
referant.

Each link has a thickness $t_k$, which is computed from
the frequency of the word $w_k$ in Glosseme and other
information, and normalized as $\sum_k t_k = 1$ in each
subreferant or refere. Each subreferant also has a
thickness (for example, 0.333333 in the first subreferant
of red_1), which is computed from the order of the units,
which represents the significance of the definitions.
Appendix A describes the structure of Paradigme in detail.

4 Computing Similarity between 
Words 

Similarity between words is computed by spreading
activation on Paradigme. Each of its nodes can hold
activity, which moves through the links. Each node
computes its activity value $v_i(T+1)$ at time $T+1$ as
follows:

$$v_i(T+1) = \phi(R_i(T), R'_i(T), e_i(T)),$$

where $R_i(T)$ and $R'_i(T)$ are the sums of weighted
activity (at time $T$) of the nodes referred to in the
referant and the refere respectively, and $e_i(T)$ is
activity given from outside (at time $T$); to 'activate a
node' is to let $e_i(T) > 0$. The output function $\phi$
sums up the three activity values in appropriate
proportion and limits the output value to [0,1].
Appendix B gives the details of the spreading activation.



4.1 Measuring Similarity 

Activating a node for a certain period of time causes
the activity to spread over Paradigme and produce an
activated pattern on it. The activated pattern
approximately reaches equilibrium after 10 steps,
although it never reaches actual equilibrium. The pattern
thus produced represents the meaning of the node or of
the words related to the node by morphological analysis.¹

The activated pattern, produced from a word $w$,
suggests similarity between $w$ and any headword in LDV.
The similarity $\sigma(w, w') \in [0,1]$ is computed in
the following way. (See also Figure 4.)

1. Reset activity of all nodes in Paradigme.

2. Activate $w$ with strength $s(w)$ for 10 steps,
   where $s(w)$ is the significance of the word $w$.
   Then, an activated pattern $P(w)$ is produced
   on Paradigme.

3. Observe $a(P(w), w')$, an activity value of the
   node $w'$ in $P(w)$.
   Then, $\sigma(w, w')$ is $s(w') \cdot a(P(w), w')$.
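The three steps above can be sketched as a short loop. This is a minimal sketch under stated assumptions, not the actual implementation: the network `TOY_LINKS`, the update rule `toy_step`, and the significance values used in the usage note are all hypothetical stand-ins (the real update rule is the one described in Appendix B, and the real network is Paradigme).

```python
from typing import Callable, Dict, List

Activity = Dict[str, float]

# Hypothetical toy network: each node lists the nodes it is linked to.
TOY_LINKS: Dict[str, List[str]] = {
    "red": ["blood", "colour"],
    "blood": ["red"],
    "colour": ["red"],
    "mat": [],
}


def toy_step(activity: Activity, external: Activity) -> Activity:
    """Stand-in update rule (an assumption, not the rule of Appendix B):
    mean activity of the linked nodes plus external input, clipped to [0, 1]."""
    new: Activity = {}
    for node, linked in TOY_LINKS.items():
        incoming = sum(activity[n] for n in linked) / len(linked) if linked else 0.0
        new[node] = min(1.0, incoming + external.get(node, 0.0))
    return new


def similarity(w: str, w_prime: str, s: Dict[str, float],
               step: Callable[[Activity, Activity], Activity] = toy_step,
               steps: int = 10) -> float:
    """sigma(w, w') = s(w') * a(P(w), w'), following steps 1 to 3."""
    activity = {node: 0.0 for node in s}        # 1. reset all nodes
    for _ in range(steps):                      # 2. activate w with strength s(w)
        activity = step(activity, {w: s[w]})
    return s[w_prime] * activity[w_prime]       # 3. observe w' and scale by s(w')
```

With hypothetical significances such as `{"red": 0.5, "blood": 0.6, "colour": 0.4, "mat": 0.7}`, the linked pair red/blood scores above the unlinked pair red/mat, which receives no activity at all in this toy network.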

The word significance $s(w) \in [0,1]$ is defined as
the normalized information of the word $w$ in the corpus
[West, 1953]. For example, the word red appears 2,308
times in the 5,487,056-word corpus, and the word and
appears 106,064 times. So, $s(\text{red})$ and
$s(\text{and})$ are computed as follows:

$$s(\text{red}) = \frac{-\log(2308/5487056)}{-\log(1/5487056)} = 0.500955,$$

$$s(\text{and}) = \frac{-\log(106064/5487056)}{-\log(1/5487056)} = 0.254294.$$



We estimated the significance of the words excluded
from the word list [West, 1953] at the average
significance of their word classes. This interpolation
virtually enlarged West's 5,000,000-word corpus.
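The significance formula is easy to check numerically. The sketch below is illustrative only; the function name `significance` and the constant `CORPUS_SIZE` are names chosen here, while the counts and corpus size are the ones quoted above. The base of the logarithm cancels in the ratio, so the natural logarithm is used.

```python
import math

CORPUS_SIZE = 5_487_056  # size of the corpus quoted in the text


def significance(count: int, corpus_size: int = CORPUS_SIZE) -> float:
    """Normalized information of a word that occurs `count` times:
    -log(count / N) divided by -log(1 / N)."""
    return math.log(count / corpus_size) / math.log(1 / corpus_size)


s_red = significance(2308)      # the text reports 0.500955
s_and = significance(106064)    # the text reports 0.254294
```

A word that occurs once gets significance 1, and a word that made up the whole corpus would get 0, which is why frequent function words such as and score low.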

For example, let us consider the similarity between
red and orange. First, we produce an activated pattern
$P(\text{red})$ on Paradigme. (See Figure 5.) In this
case, both of the nodes red_1 (adjective) and red_2
(noun) are activated with strength
$s(\text{red}) = 0.500955$. Next, we compute
$s(\text{orange}) = 0.676253$, and observe
$a(P(\text{red}), \text{orange}) = 0.390774$. Then, the
similarity between red and orange is obtained as follows:

$$\sigma(\text{red}, \text{orange}) = 0.676253 \cdot 0.390774 = 0.264262.$$



¹ The morphological analysis maps all words derived
with 48 affixes in LDV onto their root forms (i.e.
headwords of LDOCE).

Figure 4. Process of measuring the similarity $\sigma(w, w')$ on Paradigme:
(1) Start activating $w$. (2) Produce an activated pattern. (3) Observe activity of $w'$.



4.2 Examples of Similarity between Words 

The procedure described above can compute the
similarity $\sigma(w, w')$ between any two words $w$, $w'$
in LDV and their derivations. Computer programs of this
procedure (spreading activation in C; morphological
analysis and others in Common Lisp) can compute
$\sigma(w, w')$ within 2.5 seconds on a workstation
(SPARCstation 2).

The similarity $\sigma$ between words works as an
indicator of lexical cohesion. The following examples
illustrate that $\sigma$ increases with the strength of
the semantic relation:

    $\sigma$(wine, alcohol) = 0.118078 ,
    $\sigma$(wine, line) = 0.002040 ,
    $\sigma$(big, large) = 0.120587 ,
    $\sigma$(clean, large) = 0.004943 ,
    $\sigma$(buy, sell) = 0.135686 ,
    $\sigma$(buy, walk) = 0.007993 .

The similarity $\sigma$ also increases with the
co-occurrence tendency of words, for example:

    $\sigma$(waiter, restaurant) = 0.175699 ,
    $\sigma$(computer, restaurant) = 0.003268 ,
    $\sigma$(red, blood) = 0.111443 ,
    $\sigma$(green, blood) = 0.002268 ,
    $\sigma$(dig, spade) = 0.116200 ,
    $\sigma$(fly, spade) = 0.003431 .

Note that $\sigma(w, w')$ has direction (from $w$ to
$w'$), so that $\sigma(w, w')$ may not be equal to
$\sigma(w', w)$:

    $\sigma$(films, theatre) = 0.178988 ,
    $\sigma$(theatre, films) = 0.068927 .

Meaningful words should have higher similarity;
meaningless words (especially function words) should have
lower similarity. The similarity $\sigma(w, w')$ increases
with the significance $s(w)$ and $s(w')$, which represent
the meaningfulness of $w$ and $w'$:

    $\sigma$(north, east) = 0.100482 ,
    $\sigma$(to, theatre) = 0.007259 ,
    $\sigma$(films, of) = 0.005914 ,
    $\sigma$(to, the) = 0.002240 .

Note that the reflective similarity $\sigma(w, w)$ also
depends on the significance $s(w)$, so that
$\sigma(w, w) \le 1$:

    $\sigma$(waiter, waiter) = 0.596803 ,
    $\sigma$(of, of) = 0.045256 .

Figure 6. Measuring similarity of extra words
as the similarity between word lists.



4.3 Similarity of Extra Words 

The similarity of words in LDV and their derivations
is measured directly on Paradigme; the similarity of
extra words is measured indirectly on Paradigme by
treating an extra word as a word list
$W = \{w_1, \ldots, w_n\}$ of its definition in LDOCE.
(Note that each $w_i \in W$ is included in LDV or their
derivations.)

The similarity between the word lists $W$, $W'$ is
defined as follows. (See also Figure 6.)

$$\sigma(W, W') = \psi\Big(\sum_{w' \in W'} s(w') \cdot a(P(W), w')\Big),$$

where $P(W)$ is the activated pattern produced from $W$
by activating each $w_i \in W$ with strength
$s(w_i)^2 / \sum_k s(w_k)$ for 10 steps, and $\psi$ is an
output function which limits the value to [0,1].
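The two ingredients of this definition, the activation strength of each word in $W$ and the weighted sum over $W'$, can be sketched as follows. The function names are hypothetical, the activated pattern is taken as given rather than computed, and $\psi$ is assumed here to be a plain clip to [0,1], since the text only says that it limits the value to that range. Repeated words in a list are assumed to count once per occurrence.

```python
from typing import Dict, List


def activation_strengths(words: List[str], s: Dict[str, float]) -> List[float]:
    """Strength s(w_i)^2 / sum_k s(w_k) for each word of the list W."""
    total = sum(s[w] for w in words)
    return [s[w] ** 2 / total for w in words]


def list_similarity(pattern: Dict[str, float], target_words: List[str],
                    s: Dict[str, float]) -> float:
    """sigma(W, W') = psi(sum over w' in W' of s(w') * a(P(W), w')).
    `pattern` maps each node to its activity in P(W); psi is assumed
    to be a clip to [0, 1]."""
    raw = sum(s[w] * pattern.get(w, 0.0) for w in target_words)
    return max(0.0, min(1.0, raw))
```

Dividing the squared significance by the total keeps the overall input bounded, while still letting the more significant words of the list dominate the activated pattern.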

As shown in Figure 7, bottle_1 and wine_1 have high
activity in the pattern produced from the phrase "red
alcoholic drink". So, we may say that the overlapped
pattern implies "a bottle of wine".

For example, the similarity between linguistics and
stylistics, both of which are extra words, is computed
as follows:

    $\sigma$(linguistics, stylistics)
      = $\sigma$({the, study, of, language, in,
                  general, and, of, particular,
                  languages, and, their, structure,
                  and, grammar, and, history},
                 {the, study, of, style, in,
                  written, or, spoken, language})
      = 0.140089 .

[Figure 7: activity values plotted against T (steps) for the nodes
alcohol_1, drink_1, red_2, drink_2, red_1, bottle_1, wine_1, poison_1,
swallow_1, spirit_1.]

Figure 7. An activated pattern produced from
the word list: {red, alcoholic, drink}.

Figure 8. Episode association on Paradigme
(recalling the most similar episode in memory).

Obviously, both $\sigma(W, w)$ and $\sigma(w, W)$, where
$W$ is an extra word and $w$ is not, are also computable.
Therefore, we can compute the similarity between any two
headwords in LDOCE and their derivations.

5 Applications of the Similarity 

This section shows the application of the similarity 
between words to text analysis — measuring similar- 
ity between texts, and measuring text coherence. 

5.1 Measuring Similarity between Texts 

Suppose a text is a word list without syntactic
structure. Then, the similarity $\sigma(X, X')$ between
two texts $X$, $X'$ can be computed as the similarity of
extra words described above.



The following examples suggest that the similarity
between texts indicates the strength of the coherence
relation between them:

    $\sigma$("I have a hammer.", "Take some nails.") = 0.100611 ,
    $\sigma$("I have a hammer.", "Take some apples.") = 0.005295 ,
    $\sigma$("I have a pen.", "Where is ink?") = 0.113140 ,
    $\sigma$("I have a pen.", "Where do you live?") = 0.007676 .

It is worth noting that meaningless iteration of words
(especially of function words) has less influence on the
text similarity:

    $\sigma$("It is a dog.", "That must be your dog.") = 0.252536 ,
    $\sigma$("It is a dog.", "It is a log.") = 0.053261 .

The text similarity provides a semantic space for text
retrieval: recalling the text in
$\{X'_1, \ldots, X'_n\}$ most similar to the given text
$X$. Once the activated pattern $P(X)$ of the text $X$ is
produced on Paradigme, we can compute and compare the
similarities $\sigma(X, X'_1), \ldots, \sigma(X, X'_n)$
immediately. (See Figure 8.)
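The retrieval idea, one activated pattern reused for every candidate text, can be sketched as below. This is a hedged illustration: `rank_texts` is a hypothetical name, the pattern is assumed to have been produced already, the output function is assumed to be a clip at 1, and the numbers in the usage note are invented for the example rather than measured on Paradigme.

```python
from typing import Dict, List, Tuple


def rank_texts(pattern: Dict[str, float],
               candidates: Dict[str, List[str]],
               s: Dict[str, float]) -> List[Tuple[float, str]]:
    """Score every candidate word list against one precomputed pattern P(X)
    and return (score, name) pairs, most similar first."""
    def score(words: List[str]) -> float:
        # Weighted activity of the candidate's words, limited to at most 1.
        return min(1.0, sum(s.get(w, 0.0) * pattern.get(w, 0.0) for w in words))

    scored = [(score(words), name) for name, words in candidates.items()]
    return sorted(scored, reverse=True)


# Usage with made-up activity and significance values.
pattern = {"hammer": 0.9, "nail": 0.4, "apple": 0.01}
s = {"nail": 0.6, "apple": 0.6, "take": 0.3, "some": 0.2}
candidates = {
    "nails": ["take", "some", "nail"],
    "apples": ["take", "some", "apple"],
}
ranking = rank_texts(pattern, candidates, s)
```

Because the expensive step is producing the pattern, scoring each additional candidate costs only one pass over its words.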

5.2 Measuring Text Coherence 

Let us consider the reflective similarity
$\sigma(X, X)$ of a text $X$, and use the notation $c(X)$
for $\sigma(X, X)$. Then, $c(X)$ can be computed as
follows:

$$c(X) = \psi\Big(\sum_{w \in X} s(w) \cdot a(P(X), w)\Big).$$

The activated pattern $P(X)$, as shown in Figure 7,
represents the average meaning of $w_i \in X$. So, $c(X)$
represents the cohesiveness of $X$, that is, the semantic
closeness of $w \in X$, or the semantic compactness of
$X$. (It is also closely related to distortion in
clustering.)

The following examples suggest that $c(X)$ indicates
the strength of coherence of $X$:

    c("She opened the world with her
       typewriter. Her work was typing.
       But she did not type quickly.")
      = 0.502510 (coherent),

    c("Put on your clothes at once.
       I can not walk ten miles.
       There is no one here but me.")
      = 0.250840 (incoherent).

However, a cohesive text can be incoherent; the
following example shows the cohesiveness of an incoherent
text (three sentences randomly selected from LDOCE):

    c("I saw a lion.
       A lion belongs to the cat family.
       My family keeps a pet.")
      = 0.560172 (incoherent, but cohesive).

Thus, $c(X)$ cannot capture all the aspects of text
coherence. This is because $c(X)$ is based only on the
lexical cohesion of the words in $X$.

6 Discussion 

The structure of Paradigme represents the knowl- 
edge system of English, and an activated state pro- 
duced on it represents word meaning. This section 
discusses the nature of the structure and states of 
Paradigme, and also the nature of the similarity com- 
puted on it. 

6.1 Paradigme and Semantic Space 

The set of all the possible activated patterns pro- 
duced on Paradigme can be considered as a seman- 
tic space where each state is represented as a point. 
The semantic space is a 2,851-dimensional hyper- 
cube; each of its edges corresponds to a word in 
LDV. 

LDV is selected according to the following infor- 
mation: the word frequency in written English, and 
the range of contexts in which each word appears. 
So, LDV has a potential for covering all the concepts 
commonly found in the world. 

This implies the completeness of LDV as dimen- 
sions of the semantic space. Osgood's semantic dif- 
ferential procedure [1952] used 50 adjective dimen- 
sions; our semantic measurement uses 2,851 dimen- 
sions with completeness and objectivity. 

Our method can be applied to construct a semantic
network from an ordinary dictionary whose defining
vocabulary is not restricted. Such a network, however,
is too large to spread activity over. Paradigme is a
small and complete network for measuring the similarity.

6.2 Connotation and Extension of Words 

The proposed similarity is based only on the
denotational and intensional definitions in the
dictionary LDOCE. The lack of connotational and
extensional knowledge causes some unexpected results in
measuring the similarity. For example, consider the
following similarity:

    $\sigma$(tree, leaf) = 0.008693 .

This is due to the nature of dictionary definitions:
they only indicate sufficient conditions of the headword.
For example, the definition of tree in LDOCE tells
nothing about leaves:



tree n 1 a tall plant with a wooden trunk and 
branches, that lives for many years 2 a bush 
or other plant with a treelike form 3 a drawing 
with a branching form, esp. as used for showing 
family relationships 

However, the definition is followed by pictures of 
leafy trees providing readers with connotational and 
extensional stereotypes of trees. 

6.3 Paradigmatic and Syntagmatic 
Similarity 

In the proposed method, the definitions in LDOCE 
are treated as word lists, though they are phrases 
with syntactic structures. Let us consider the fol- 
lowing definition of lift: 

lift v 1 to bring from a lower to a higher level; 
raise 2 (of movable parts) to be able to be 
lifted 3 

Anyone can imagine that something is moving upward.
But such a movement cannot be represented in the
activated pattern produced from the phrase. The meaning
of a phrase, sentence, or text should be represented as a
pattern changing in time, though what we need is a static
and paradigmatic relation.

This paradox also arises in measuring the similarity
between texts and the text coherence. As we have seen in
Section 5, there is a difference between the similarity
of texts and the similarity of word lists, and also
between the coherence of a text and the cohesiveness of a
word list.

However, so far as the similarity between words is
concerned, we assume that activated patterns on Paradigme
will approximate the meaning of words, just as a still
picture can express a story.

7 Conclusion 

We described measurement of semantic similarity be- 
tween words. The similarity between words is com- 
puted by spreading activation on the semantic net- 
work Paradigme which is systematically constructed 
from a subset of the English dictionary LDOCE. 
Paradigme can directly compute the similarity be- 
tween any two words in LDV, and indirectly the sim- 
ilarity of all the other words in LDOCE. 

The similarity between words provides a new method
for analysing the structure of text. It can be applied to
computing the similarity between texts, and to measuring
the cohesiveness of a text, which suggests the coherence
of the text, as we have seen in Section 5. We are now
applying it to text segmentation [Grosz and Sidner, 1986,
Youmans, 1991], i.e. to capturing the shifts of coherent
scenes in a story.



In future research, we intend to deal with syntag- 
matic relations between words. Meaning of a text lies 
in the texture of paradigmatic and syntagmatic re- 
lations between words [Hjelmslev, 1943]. Paradigme 
provides the former dimension — an associative sys- 
tem of words — as a screen onto which the meaning 
of a word is projected like a still picture. The latter 
dimension — syntactic process — will be treated as 
a film projected dynamically onto Paradigme. This 
enables us to measure the similarity between texts 
as a syntactic process, not as word lists. 

We regard Paradigme as a field for the interaction
between text and episodes in memory, that is, the
interaction between what one is hearing or reading and
what one knows [Schank, 1990]. The meaning of words,
sentences, or even texts can be projected in a uniform
way on Paradigme, as we have seen in Sections 4 and 5.
Similarly, we can project text and episodes, and recall
the most relevant episode for interpretation of the text.

Appendix A. Structure of Paradigme 
— Mapping Glosseme onto Paradigme 

The semantic network Paradigme is systematically
constructed from the small and closed English dictionary
Glosseme. Each entry of Glosseme is mapped onto a node of
Paradigme in the following way. (See also Figures 2 and 3.)

Step 1. For each entry $G_i$ in Glosseme, map each
unit $u_{ij}$ in $G_i$ onto a subreferant $s_{ij}$ of the
corresponding node $P_i$ in Paradigme. Each word
$w_{ijn} \in u_{ij}$ is mapped onto a link or links in
$s_{ij}$, in the following way:

1. Let $t_n$ be the reciprocal of the number of
   appearances of $w_{ijn}$ (as its root form) in Glosseme.

2. If $w_{ijn}$ is in a head-part, let $t_n$ be doubled.

3. Find the nodes $\{p_{n1}, p_{n2}, \ldots\}$
   corresponding to $w_{ijn}$ (e.g. red $\to$ {red_1,
   red_2}). Then, divide $t_n$ into
   $\{t_{n1}, t_{n2}, \ldots\}$ in proportion to their
   frequency.

4. Add links $l_{n1}, l_{n2}, \ldots$ to $s_{ij}$, where
   $l_{nm}$ is a link to the node $p_{nm}$ with thickness
   $t_{nm}$.

Thus, $s_{ij}$ becomes a set of links:
$\{l_{ij1}, l_{ij2}, \ldots\}$, where $l_{ijk}$ is a link
with thickness $t_{ijk}$. Then, normalize the thickness of
the links as $\sum_k t_{ijk} = 1$, in each $s_{ij}$.
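Step 1 can be sketched for a single unit as follows. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, words are assumed to be already reduced to root forms, and thickness is assumed to accumulate when two words of the same unit lead to the same node (the text speaks only of a set of links). The counts in the usage note are invented.

```python
from typing import Dict, List


def subreferant_links(head_part: List[str],
                      det_parts: List[List[str]],
                      glosseme_count: Dict[str, int],
                      node_freq: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Map one unit onto normalized link thicknesses {node: t_ijk}.

    glosseme_count[w]: number of appearances of root form w in Glosseme.
    node_freq[w]:      the nodes for w with their relative frequencies.
    """
    links: Dict[str, float] = {}
    parts = [(head_part, 2.0)] + [(part, 1.0) for part in det_parts]
    for words, factor in parts:
        for w in words:
            t = factor / glosseme_count[w]        # items 1 and 2: reciprocal, doubled in head-part
            total = sum(node_freq[w].values())
            for node, freq in node_freq[w].items():
                share = t * freq / total          # item 3: divide in proportion to frequency
                links[node] = links.get(node, 0.0) + share   # item 4: add the link
    norm = sum(links.values())
    return {node: t / norm for node, t in links.items()}     # thicknesses sum to 1


# Usage with made-up counts: head-part "pink", one det-part "short".
links = subreferant_links(
    ["pink"], [["short"]],
    glosseme_count={"pink": 10, "short": 20},
    node_freq={"pink": {"pink_1": 1, "pink_2": 1}, "short": {"short_1": 1}},
)
```

The reciprocal makes rare words carry thick links and frequent words such as of carry thin ones, which matches the weights visible in Figure 3.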

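Step 2. For each node $P_i$, compute the thickness
$h_{ij}$ of each subreferant $s_{ij}$ in the following way:

1. Let $m_i$ be the number of subreferants of $P_i$.

2. Let $h_{ij}$ be $2m_i - 1 - j$.
   (Note that $h_{i1} : h_{im_i} = 2 : 1$.)

3. Normalize the thickness $h_{ij}$ as
   $\sum_j h_{ij} = 1$, in each $P_i$.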
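The subreferant thickness follows from the rule $h_{ij} = 2m_i - 1 - j$ given in this appendix, and the result can be compared with the weights printed for red_1 in Figure 3. The function name below is hypothetical, and the handling of a node with a single subreferant is an assumption, because the rule yields 0 in that case and the normalization would be undefined.

```python
from typing import List


def subreferant_thickness(m: int) -> List[float]:
    """Normalized thickness h_ij for j = 1..m, from h_ij = 2m - 1 - j."""
    if m == 1:
        # Assumption: a lone subreferant takes all the weight, since the
        # raw value 2*1 - 1 - 1 = 0 cannot be normalized.
        return [1.0]
    raw = [2 * m - 1 - j for j in range(1, m + 1)]
    total = sum(raw)
    return [h / total for h in raw]


# red_1 has four subreferants: raw thicknesses 6, 5, 4, 3 out of 18.
h_red = [round(h, 6) for h in subreferant_thickness(4)]
```

For four subreferants this gives 0.333333, 0.277778, 0.222222 and 0.166667, the same weights that appear in Figure 3, so earlier definitions count for more than later ones.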

Step 3. Generate the refere of each node in
Paradigme, in the following way:

1. For each node $P_i$ in Paradigme, let its refere
   $r_i$ be an empty set.

2. For each $P_i$, for each subreferant $s_{ij}$ of
   $P_i$, for each link $l_{ijk}$ in $s_{ij}$:

   a. Let $p_{ijk}$ be the node referred to by $l_{ijk}$,
      and let $t_{ijk}$ be the thickness of $l_{ijk}$.

   b. Add a new link $l'$ to the refere of $p_{ijk}$,
      where $l'$ is a link to $P_i$ with thickness
      $t' = h_{ij} \cdot t_{ijk}$.

3. Thus, each $r_i$ becomes a set of links:
   $\{l'_{i1}, l'_{i2}, \ldots\}$, where $l'_{ij}$ is a
   link with thickness $t'_{ij}$. Then, normalize the
   thickness of the links as $\sum_j t'_{ij} = 1$, in
   each $r_i$.
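Step 3 inverts the referant links. A minimal sketch, assuming a dictionary representation that is not part of the original: `referants[node]` is a list of subreferants, each a mapping from target node to $t_{ijk}$, and `h[node]` holds the matching $h_{ij}$ values. Back-links from the same node through several subreferants are merged by summing, which is an assumption made here for simplicity.

```python
from typing import Dict, List

Links = Dict[str, float]


def build_referes(referants: Dict[str, List[Links]],
                  h: Dict[str, List[float]]) -> Dict[str, Links]:
    """For every node, collect normalized back-links from the nodes
    whose referants point to it."""
    referes: Dict[str, Links] = {node: {} for node in referants}   # item 1
    for node, subreferants in referants.items():                   # item 2
        for j, links in enumerate(subreferants):
            for target, t in links.items():
                back = referes.setdefault(target, {})
                back[node] = back.get(node, 0.0) + h[node][j] * t  # t' = h_ij * t_ijk
    for back in referes.values():                                  # item 3
        total = sum(back.values())
        for source in back:
            back[source] /= total
    return referes


# Usage on a hypothetical four-node fragment.
referants = {
    "apple": [{"red": 1.0}],
    "rose": [{"red": 0.5, "colour": 0.5}],
    "red": [{"colour": 0.5, "blood": 0.5}, {"colour": 1.0}],
    "colour": [],
    "blood": [],
}
h = {"apple": [1.0], "rose": [1.0], "red": [2 / 3, 1 / 3], "colour": [], "blood": []}
referes = build_referes(referants, h)
```

In this fragment the refere of red ends up pointing to apple and rose, which is the extensional information (things that are red) that the referant of red does not contain.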

Appendix B. Function of Paradigme 
— Spreading Activation Rules 

Each node $P_i$ of the semantic network Paradigme
computes its activity value $v_i(T+1)$ at time $T+1$ as
follows:

$$v_i(T+1) = \phi\left(\frac{R_i(T) + R'_i(T)}{2} + e_i(T)\right),$$

where $R_i(T)$ and $R'_i(T)$ are activity (at time $T$)
collected from the nodes referred to in the referant and
the refere respectively; $e_i(T) \in [0,1]$ is activity
given from outside (at time $T$); the output function
$\phi$ limits the value to [0,1].

$R_i(T)$ is the activity of the most plausible
subreferant in $P_i$, defined as follows:

$$R_i(T) = S_{im}(T), \qquad m = \operatorname{argmax}_j \{ h_{ij} \cdot S_{ij}(T) \},$$

where $h_{ij}$ is the thickness of the $j$-th subreferant
of $P_i$. $S_{ij}(T)$ is the sum of weighted activity of
the nodes referred to in the $j$-th subreferant of $P_i$,
defined as follows:

$$S_{ij}(T) = \sum_k t_{ijk} \cdot a_{ijk}(T),$$

where $t_{ijk}$ is the thickness of the $k$-th link of
$s_{ij}$, and $a_{ijk}(T)$ is the activity (at time $T$)
of the node referred to by the $k$-th link of $s_{ij}$.

$R'_i(T)$ is the weighted activity of the nodes
referred to in the refere $r_i$ of $P_i$:

$$R'_i(T) = \sum_k t'_{ik} \cdot a'_{ik}(T),$$

where $t'_{ik}$ is the thickness of the $k$-th link of
$r_i$, and $a'_{ik}(T)$ is the activity (at time $T$) of
the node referred to by the $k$-th link of $r_i$.
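The update of a single node can be sketched from these definitions. This is an illustration under assumptions rather than the original C program: the function name and argument layout are hypothetical, the output function $\phi$ is assumed to be a plain clip to [0,1], and the equal averaging of $R_i$ and $R'_i$ before adding the external input is a reading of the formula that should be treated as an assumption.

```python
from typing import Dict, List, Tuple

Links = Dict[str, float]


def update_activity(subreferants: List[Tuple[float, Links]],
                    refere: Links,
                    activity: Dict[str, float],
                    external: float = 0.0) -> float:
    """One spreading-activation step for a single node.

    subreferants: (h_ij, {target: t_ijk}) pairs of the node's referant.
    refere:       {source: t'_ik} back-links of the node.
    activity:     current activity of every node at time T.
    """
    # S_ij(T): weighted activity collected through each subreferant.
    sums = [sum(t * activity.get(n, 0.0) for n, t in links.items())
            for _, links in subreferants]
    if sums:
        # R_i(T): the sum of the most plausible subreferant (argmax of h_ij * S_ij).
        m = max(range(len(sums)), key=lambda j: subreferants[j][0] * sums[j])
        r = sums[m]
    else:
        r = 0.0
    # R'_i(T): weighted activity collected through the refere.
    r_prime = sum(t * activity.get(n, 0.0) for n, t in refere.items())
    # Assumed phi: average of R and R', plus external input, clipped to [0, 1].
    return max(0.0, min(1.0, (r + r_prime) / 2 + external))


# Usage with made-up values for a node with two subreferants.
v = update_activity(
    subreferants=[(0.6, {"colour": 1.0}), (0.4, {"pink": 1.0})],
    refere={"apple": 1.0},
    activity={"colour": 0.2, "pink": 0.5, "apple": 0.4},
    external=0.3,
)
```

Selecting a single subreferant, rather than summing all of them, lets the node follow whichever sense of the headword currently fits the active context best.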